Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar ideas (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected depending entirely on the available
computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems
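The surrogate idea described above can be sketched in a few lines: at each iteration the nonconvex loss is replaced by its first-order linearization plus a strongly convex proximal term, whose minimizer has a closed form. This is a minimal illustrative sketch on a toy two-weight loss, not the paper's full algorithm; the proximal weight `tau` and the diminishing step-size rule are generic assumptions.

```python
# Minimal sketch of a successive convex approximation (SCA) loop, assuming a
# first-order surrogate: the nonconvex loss f is replaced at each step by its
# linearization at the current point plus a strongly convex proximal term
# (tau/2)||u - w||^2, whose minimizer is available in closed form.
import numpy as np

def f(w):                      # toy nonconvex loss (two-weight "network")
    return (1.0 - w[0] * w[1]) ** 2

def grad_f(w):
    r = 1.0 - w[0] * w[1]
    return np.array([-2.0 * r * w[1], -2.0 * r * w[0]])

def sca_minimize(w0, tau=1.0, iters=200):
    w = np.asarray(w0, dtype=float)
    for k in range(iters):
        # Strongly convex surrogate grad_f(w)^T (u - w) + (tau/2)||u - w||^2
        # has the closed-form minimizer u* = w - grad_f(w) / tau.
        w_hat = w - grad_f(w) / tau
        gamma = 2.0 / (k + 2.0)           # diminishing step size
        w = w + gamma * (w_hat - w)       # convex combination update
    return w

w_star = sca_minimize([0.5, 0.5])
print(f(w_star))  # close to 0
```

The surrogate here is the cheapest possible choice; the paper's point is that richer surrogates can keep more of the learning problem's structure while remaining strongly convex.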
Learning from distributed data sources using random vector functional-link networks
One of the main characteristics of many real-world big data scenarios is their distributed nature. In a machine learning context, distributed data, together with the requirements of preserving privacy and scaling up to large networks, brings the challenge of designing fully decentralized training protocols. In this paper, we explore the problem of distributed learning when the features of every pattern are spread across multiple agents (as happens, for example, in a distributed database scenario). We propose an algorithm for a particular class of neural networks, known as Random Vector Functional-Link (RVFL) networks, based on the Alternating Direction Method of Multipliers (ADMM) optimization algorithm. The proposed algorithm learns an RVFL network from multiple distributed data sources while restricting communication to the single operation of computing a distributed average. Our experimental simulations show that the algorithm achieves generalization accuracy comparable to a fully centralized solution while remaining extremely efficient.
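The RVFL model at the core of the abstract is simple enough to sketch on a single machine: the input-to-hidden weights are random and fixed, and only the output weights are learned, in closed form by ridge regression. The distributed ADMM variant would replace this solve with consensus averaging across agents; that part, and all hyperparameters below, are simplifying assumptions.

```python
# Minimal single-machine sketch of a Random Vector Functional-Link (RVFL)
# network, assuming a tanh hidden expansion with fixed random weights, direct
# input-output links, and output weights solved by ridge regression.
import numpy as np

rng = np.random.default_rng(0)

def rvfl_fit(X, y, hidden=100, lam=1e-2):
    d = X.shape[1]
    W = rng.normal(scale=2.0, size=(d, hidden))   # fixed random input weights
    b = rng.normal(size=hidden)
    H = np.hstack([np.tanh(X @ W + b), X])        # random expansion + direct links
    # Ridge-regression output weights: (H^T H + lam I)^{-1} H^T y
    beta = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return np.hstack([np.tanh(X @ W + b), X]) @ beta

# Fit a noisy 1-D regression problem.
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=200)
W, b, beta = rvfl_fit(X, y)
err = np.mean((rvfl_predict(X, W, b, beta) - y) ** 2)
print(err)  # small training error
```

Because only `beta` is trained, the whole learning problem is a linear least-squares solve, which is what makes the distributed-average-only communication pattern possible.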
Widely Linear Kernels for Complex-Valued Kernel Activation Functions
Complex-valued neural networks (CVNNs) have been shown to be powerful
nonlinear approximators when the input data can be properly modeled in the
complex domain. One of the major challenges in scaling up CVNNs in practice is
the design of complex activation functions. Recently, we proposed a novel
framework for learning these activation functions neuron-wise in a
data-dependent fashion, based on a cheap one-dimensional kernel expansion and
the idea of kernel activation functions (KAFs). In this paper we argue that,
despite its flexibility, this framework is still limited in the class of
functions that can be modeled in the complex domain. We leverage the idea of
widely linear complex kernels to extend the formulation, allowing for a richer
expressiveness without an increase in the number of adaptable parameters. We
test the resulting model on a set of complex-valued image classification
benchmarks. Experimental results show that the resulting CVNNs can achieve
higher accuracy while at the same time converging faster.
Comment: Accepted at ICASSP 201
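The kernel activation function (KAF) idea referenced above can be illustrated in its real-valued form: each neuron's nonlinearity is a cheap 1-D kernel expansion over a fixed dictionary, and only the mixing coefficients are trainable. The widely linear complex extension in the paper additionally expands over both the input and its conjugate; this sketch keeps only the real case, and the dictionary size and kernel bandwidth are generic assumptions.

```python
# Minimal sketch of a real-valued kernel activation function (KAF): a 1-D
# Gaussian-kernel expansion over a fixed dictionary, with the mixing
# coefficients alpha as the only trainable parameters.
import numpy as np

class KAF:
    def __init__(self, dict_size=20, span=3.0, gamma=1.0):
        self.d = np.linspace(-span, span, dict_size)   # fixed dictionary
        self.alpha = np.zeros(dict_size)               # trainable coefficients
        self.gamma = gamma

    def kernel(self, s):
        # Gaussian kernel between activations s and each dictionary point.
        return np.exp(-self.gamma * (s[:, None] - self.d[None, :]) ** 2)

    def __call__(self, s):
        return self.kernel(s) @ self.alpha

# Initialize the KAF to mimic tanh by ridge-fitting alpha on the dictionary.
kaf = KAF()
K = kaf.kernel(kaf.d)
kaf.alpha = np.linalg.solve(K + 1e-4 * np.eye(len(kaf.d)), np.tanh(kaf.d))

s = np.linspace(-2, 2, 5)
print(kaf(s))  # approximately tanh(s)
```

In training, `alpha` would be updated by backpropagation along with the network weights; initializing it to a known activation gives a sensible starting point.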
Bidirectional deep-readout echo state networks
We propose a deep architecture for the classification of multivariate time
series. By means of a recurrent and untrained reservoir we generate a vectorial
representation that embeds temporal relationships in the data. To improve the
memorization capability, we implement a bidirectional reservoir, whose last
state also captures past dependencies in the input. We apply dimensionality
reduction to the final reservoir states to obtain compressed, fixed-size
representations of the time series. These are subsequently fed into a deep
feedforward network trained to perform the final classification. We test our
architecture on benchmark datasets and on a real-world use case of blood
sample classification. Results show that our method performs better than a
standard echo state network and, at the same time, achieves results comparable
to a fully trained recurrent network, but with faster training.
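The pipeline described above can be sketched compactly: an untrained reservoir is run over the sequence forward and backward, and the two last states are concatenated into a fixed-size representation for a trainable readout. The dimensionality-reduction step and the deep feedforward readout are omitted here, and the reservoir size and spectral radius are generic assumptions.

```python
# Minimal sketch of a bidirectional echo state embedding: concatenate the
# last reservoir states of a forward and a reversed pass over the series.
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, rho=0.9):
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    W_in = rng.normal(scale=0.5, size=(n_res, n_in))
    return W, W_in

def last_state(series, W, W_in):
    h = np.zeros(W.shape[0])
    for x in series:                                 # leak-free ESN update
        h = np.tanh(W @ h + W_in @ x)
    return h

def embed(series, W, W_in):
    # Bidirectional: forward and reversed passes share the same reservoir.
    return np.concatenate([last_state(series, W, W_in),
                           last_state(series[::-1], W, W_in)])

W, W_in = make_reservoir(n_in=1, n_res=30)
series = np.sin(np.linspace(0, 6, 50)).reshape(-1, 1)
z = embed(series, W, W_in)
print(z.shape)  # (60,)
```

The fixed-size vector `z` is what would be compressed and fed to the feedforward classifier; only that classifier is trained.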
Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation
We study distributed stochastic nonconvex optimization in multi-agent
networks. We introduce a novel algorithmic framework for the distributed
minimization of the sum of the expected value of a smooth (possibly nonconvex)
function (the agents' sum-utility) plus a convex (possibly nonsmooth)
regularizer. The proposed method hinges on successive convex approximation
(SCA) techniques, leveraging dynamic consensus as a mechanism to track the
average gradient among the agents, and recursive averaging to recover the
expected gradient of the sum-utility function. Almost sure convergence to
(stationary) solutions of the nonconvex problem is established. Finally, the
method is applied to distributed stochastic training of neural networks.
Numerical results confirm the theoretical claims, and illustrate the advantages
of the proposed method with respect to other methods available in the
literature.
Comment: Proceedings of the 2019 Asilomar Conference on Signals, Systems, and Computers
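The dynamic-consensus mechanism the abstract mentions can be sketched on a toy problem: each agent mixes its gradient tracker with its neighbors' through a doubly stochastic matrix and adds the change in its local gradient, so the trackers follow the network-wide average gradient. This sketch uses plain gradient tracking on scalar quadratics rather than the paper's full SCA machinery; the mixing matrix, step size, and local losses are illustrative assumptions.

```python
# Minimal sketch of gradient tracking via dynamic consensus: three agents
# minimize the sum of (w-1)^2, (w-2)^2, (w-3)^2, whose global minimizer is 2.
import numpy as np

# Symmetric doubly stochastic mixing matrix for three agents.
A = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])

grads = [lambda w: 2 * (w - 1.0),     # local gradients of the three losses
         lambda w: 2 * (w - 2.0),
         lambda w: 2 * (w - 3.0)]

w = np.zeros(3)                                     # each agent's local copy
y = np.array([g(wi) for g, wi in zip(grads, w)])    # trackers start at local grads
g_old = y.copy()
step = 0.1

for _ in range(200):
    w = A @ w - step * y                            # consensus + tracked-gradient step
    g_new = np.array([g(wi) for g, wi in zip(grads, w)])
    y = A @ y + g_new - g_old                       # dynamic consensus update
    g_old = g_new

print(w)  # all agents near the global minimizer 2.0
```

The key invariant is that the average of the trackers `y` always equals the average of the current local gradients, which is what lets every agent descend along the (unobservable) sum-utility gradient.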
Pixle: a fast and effective black-box attack based on rearranging pixels
Recent research has found that neural networks are vulnerable to several types of adversarial attacks, in which input samples are modified so that the model misclassifies the adversarial sample. In this paper we focus on black-box adversarial attacks, which can be performed without knowing the inner structure of the attacked model or its training procedure, and we propose a novel attack that correctly attacks a high percentage of samples by rearranging a small number of pixels within the attacked image. We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
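The black-box search loop behind such an attack can be sketched with query access only: repeatedly propose moving one pixel's value onto another location and keep the proposal whenever it lowers the model's score for the true class. The scorer below is a stand-in linear model, not the networks attacked in the paper, and the proposal scheme is a simplified assumption about the method.

```python
# Minimal sketch of a Pixle-style black-box attack: random pixel-rearranging
# proposals, accepted only when they reduce the true-class score. Only
# queries to the scorer are used; no gradients or model internals.
import numpy as np

rng = np.random.default_rng(0)

def score_true_class(img, w):
    # Black-box stand-in: higher means more confident in the true class.
    return float(np.sum(w * img))

def pixle_attack(img, w, iters=300):
    adv = img.copy()
    best = score_true_class(adv, w)
    for _ in range(iters):
        (sx, sy), (dx, dy) = rng.integers(0, img.shape[0], size=(2, 2))
        proposal = adv.copy()
        proposal[dx, dy] = adv[sx, sy]         # paste source pixel at destination
        s = score_true_class(proposal, w)
        if s < best:                           # keep only improving proposals
            adv, best = proposal, s
    return adv, best

img = rng.random((8, 8))
w = rng.random((8, 8))                         # stand-in class weights
adv, best = pixle_attack(img, w)
print(score_true_class(img, w), best)  # the attack lowers the true-class score
```

Because each accepted proposal moves a single pixel value, the perturbation stays sparse, which is what keeps the adversarial image visually close to the original.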
Continual Learning with Invertible Generative Models
Catastrophic forgetting (CF) happens whenever a neural network overwrites
past knowledge while being trained on new tasks. Common techniques to handle CF
include regularization of the weights (using, e.g., their importance on past
tasks), and rehearsal strategies, where the network is constantly re-trained on
past data. Generative models have also been applied for the latter, in order to
have endless sources of data. In this paper, we propose a novel method that
combines the strengths of regularization and generative-based rehearsal
approaches. Our generative model consists of a normalizing flow (NF), a
probabilistic and invertible neural network, trained on the internal embeddings
of the network. By keeping a single NF throughout the training process, we show
that our memory overhead remains constant. In addition, exploiting the
invertibility of the NF, we propose a simple approach to regularize the
network's embeddings with respect to past tasks. We show that our method
performs favorably with respect to state-of-the-art approaches in the
literature, with bounded computational power and memory overheads.
Comment: arXiv admin note: substantial text overlap with arXiv:2007.0244
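The invertibility property the method exploits can be illustrated with an affine coupling layer, a standard normalizing-flow building block: it maps embeddings forward and reconstructs them exactly from the output. The tiny fixed conditioner networks below are stand-ins for trained ones, and the 4-D "embeddings" are an illustrative assumption.

```python
# Minimal sketch of normalizing-flow invertibility via an affine coupling
# layer: y1 = x1 and y2 = x2 * exp(s(x1)) + t(x1), which is exactly
# invertible because s and t depend only on the unchanged half x1.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2, 2)) * 0.5    # stand-in conditioner weights

def scale_translate(x1):
    h = np.tanh(x1 @ W1)                     # tiny conditioner network
    return h @ W2, np.tanh(x1 @ W2)          # log-scale s and shift t

def forward(x):
    x1, x2 = x[:, :2], x[:, 2:]
    s, t = scale_translate(x1)
    return np.hstack([x1, x2 * np.exp(s) + t])

def inverse(y):
    y1, y2 = y[:, :2], y[:, 2:]
    s, t = scale_translate(y1)               # y1 == x1, so s and t are recoverable
    return np.hstack([y1, (y2 - t) * np.exp(-s)])

x = rng.normal(size=(5, 4))                  # five 4-D "embeddings"
err = float(np.max(np.abs(inverse(forward(x)) - x)))
print(err)  # numerically zero
```

Exact reconstruction is what allows stored latent codes to be decoded back into past-task embeddings for rehearsal without keeping the raw data.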